Posts with tag Data Visualization
Back to all postsI’ve been doing a lot of my data exploration lately on Observable Notebooks, which is–sort of–a Javascript version of Jupyter notebooks that automatically runs all the code inline. Married with Vega-Lite or D3, it provides a way to make data exploration editable and shareable in a way that R and python data code simply can’t be; and since it’s all HTML, you can do more interesting things.
As I often do, I’m going to pull away from various forms of Internet reading/engagement through Lent. This year, this brings to mind one of my favorite stray observations about digital libraries that I’ve never posted anywhere.
As part of the 2016 Republican Primary, Jeb! Bush released a website enabling exploration of e-mails related to his official accounts as governor of Florida in the early 2000s. This whole sentence has an antiquity to it; the idea of pre-emptive disclosure (in large part to contrast with his presumed general election opponent, Hilly Clinton) seems hopelessly antique. And at the time, it was critized for accidentally disclosing all sort of personal information, both stories and Social Security Numbers. It did not make Jeb! president. Anyhow, back then I downloaded Jeb!’s e-mails–and Hillary’s–to think about what sort of stuff historians will do with these records in the future.
As part of the Creating Data project, I’ve been doing a lot of work lately with interactive scatterplots. The most interesting of them is this one about the full Hathi collection. But I’ve posted a few more I want to link to from here:
I have a new article on dimensionality reduction on massive digital libraries this month. Because it’s a technique with applications beyond the specific tasks outlined there, I want to link to a few things here.
The article in Cultural Analytics.
A visualization of 13 million books from the Hathi Trust in Creating Data.
Instructions for best using those features for your own projects in Creating Data.
Here’s a very technical, but kind of fun, problem: what’s the optimal order for a list of geographical elements, like the states of the USA?
If you’re just here from the future, and don’t care about the details, here’s my favorite answer right now:
["HI","AK","WA","OR","CA","AZ","NM","CO","WY","UT","NV","ID","MT","ND","SD","NE","KS","IA","MN","MO","OH","MI","IN","IL","WI","OK","AR","TX","LA","MS","AL","TN","KY","GA","FL","SC","WV","NC","VA","MD","DE","PA","NJ","NY","CT","RI","MA","NH","VT","ME"]